Smart Paper Notes

Causal Conceptions of Fairness and their Consequences

Hamed Nilforoshan, Johann Gaebler, Ravi Shroff, Sharad Goel

Categories: Machine Learning | Artificial Intelligence

2022-07-12

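Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, such as race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always, in a measure-theoretic sense, result in Pareto-dominated decision policies, meaning that there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove that the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.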

translated by Google Translate

Regulating Gatekeeper AI and Data: Transparency, Access, and Fairness under the DMA, the GDPR, and beyond

Philipp Hacker, Johann Cordes, Janina Rochon

Categories: Artificial Intelligence

2022-12-09

Artificial intelligence is not only increasingly used in business and administration contexts, but a race for its regulation is also underway, with the EU spearheading the efforts. Contrary to existing literature, this article suggests, however, that the most far-reaching and effective EU rules for AI applications in the digital economy will not be contained in the proposed AI Act - but have just been enacted in the Digital Markets Act. We analyze the impact of the DMA and related EU acts on AI models and their underlying data across four key areas: disclosure requirements; the regulation of AI training data; access rules; and the regime for fair rankings. The paper demonstrates that fairness, in the sense of the DMA, goes beyond the traditionally protected categories of non-discrimination law on which scholarship at the intersection of AI and law has so far largely focused. Rather, we draw on competition law and the FRAND criteria known from intellectual property law to interpret and refine the DMA provisions on fair rankings. Moreover, we show how, based on CJEU jurisprudence, a coherent interpretation of the concept of non-discrimination in both traditional non-discrimination and competition law may be found. The final part sketches specific proposals for a comprehensive framework of transparency, access, and fairness under the DMA and beyond.


Misogyny classification of German newspaper forum comments

Johann Petrak, Brigitte Krenn

Categories: Natural Language Processing | Artificial Intelligence

2022-11-30

This paper presents work on detecting misogyny in the comments of a large Austrian German language newspaper forum. We describe the creation of a corpus of 6600 comments which were annotated with 5 levels of misogyny. The forum moderators were involved as experts in the creation of the annotation guidelines and the annotation of the comments. We also describe the results of training transformer-based classification models for both binarized and original label classification of that corpus.


Statistical treatment of convolutional neural network super-resolution of inland surface wind for subgrid-scale variability quantification

Daniel Getter, Julie Bessac, Johann Rudi, Yan Feng

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-30

Machine learning models are frequently employed to perform either purely physics-free or hybrid downscaling of climate data. However, the majority of these implementations operate over relatively small downscaling factors of about 4-6x. This study examines the ability of convolutional neural networks (CNNs) to downscale surface wind speed data from three different coarse resolutions (25km, 48km, and 100km side-length grid cells) to 3km and additionally focuses on the ability to recover subgrid-scale variability. Within each downscaling factor, namely 8x, 16x, and 32x, we consider models that produce fine-scale wind speed predictions as functions of different input features: coarse wind fields only; coarse wind and fine-scale topography; and coarse wind, topography, and temporal information in the form of a timestamp. Furthermore, we train one model at 25km to 3km resolution whose fine-scale outputs are probability density function parameters through which sample wind speeds can be generated. All CNN predictions evaluated on one out-of-sample dataset outperform classical interpolation. Models with coarse wind and fine topography are shown to exhibit the best performance compared to other models operating across the same downscaling factor. Our timestamp encoding results in lower out-of-sample generalizability compared to other input configurations. Overall, the downscaling factor plays the largest role in model performance.


Graph Neural Networks: A Powerful and Versatile Tool for Advancing Design, Reliability, and Security of ICs

Lilas Alrahis, Johann Knechtel, Ozgur Sinanoglu

Categories: Machine Learning

2022-11-29

Graph neural networks (GNNs) have pushed the state-of-the-art (SOTA) for performance in learning and predicting on large-scale data present in social networks, biology, etc. Since integrated circuits (ICs) can naturally be represented as graphs, there has been a tremendous surge in employing GNNs for machine learning (ML)-based methods for various aspects of IC design. Given this trajectory, there is a timely need to review and discuss some powerful and versatile GNN approaches for advancing IC design. In this paper, we propose a generic pipeline for tailoring GNN models toward solving challenging problems for IC design. We outline promising options for each pipeline element, and we discuss selected and promising works, like leveraging GNNs to break SOTA logic obfuscation. Our comprehensive overview of GNN frameworks covers (i) electronic design automation (EDA) and IC design in general, (ii) design of reliable ICs, and (iii) design as well as analysis of secure ICs. We also provide our overview and related resources in the GNN4IC hub at https://github.com/DfX-NYUAD/GNN4IC. Finally, we discuss interesting open problems for future research.


Deconfounded Imitation Learning

Risto Vuorio, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen, Pim de Haan

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-04

Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. We break down the space of confounded imitation learning problems and identify three settings with different data requirements in which the correct imitation policy can be identified. We then introduce an algorithm for deconfounded imitation learning, which trains an inference model jointly with a latent-conditional policy. At test time, the agent alternates between updating its belief over the latent and acting under the belief. We show in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance.


SATViz: Real-Time Visualization of Clausal Proofs

Tim Holzenkamp, Kevin Kuryshev, Thomas Oltmann, Lucas Wäldele, Johann Zuber, Tobias Heuer, Markus Iser

Categories: Artificial Intelligence

2022-09-13

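Visual layouts of graphs representing SAT instances can highlight the community structure of those instances. The community structure of SAT instances has been associated with both instance hardness and known clause quality heuristics. Our tool SATViz visualizes CNF formulas using the variable interaction graph and a force-directed layout algorithm. With SATViz, clausal proofs can be animated to continuously highlight the variables that occur in a moving window of recently learned clauses. If needed, SATViz can also create new layouts of the variable interaction graph with adjusted edge weights. In this paper, we describe the structure and feature set of SATViz. We also present some interesting visualizations created with SATViz.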
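
As a rough illustration of the variable interaction graph mentioned above, the following minimal sketch builds one from a CNF formula given as DIMACS-style integer clauses. The function name and the weighting scheme (one unit of weight per clause shared by two variables) are assumptions made for this example; the abstract does not state how SATViz weights edges.

```python
from collections import defaultdict
from itertools import combinations


def variable_interaction_graph(clauses):
    """Build a variable interaction graph (hypothetical helper).

    Nodes are variables; two variables are joined by an edge when they
    occur together in at least one clause. The edge weight counts the
    shared clauses, which is an assumed weighting for illustration only.
    """
    weights = defaultdict(int)
    for clause in clauses:
        # Ignore polarity: the literals 2 and -2 both refer to variable 2.
        variables = sorted({abs(literal) for literal in clause})
        for u, v in combinations(variables, 2):
            weights[(u, v)] += 1
    return dict(weights)


# (x1 or not x2 or x3) and (x2 or x3)
cnf = [[1, -2, 3], [2, 3]]
vig = variable_interaction_graph(cnf)
```

A force-directed layout would then place strongly connected variables close together, which is what makes community structure visible.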


On the Importance of Quantifying Visibility for Autonomous Vehicles under Extreme Precipitation

Clément Courcelle, Dominic Baril, François Pomerleau, Johann Laconte

Categories: Robotics

2022-09-07

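In the context of autonomous driving, vehicles are inherently bound to encounter more extreme weather, during which public safety must be ensured. As the climate changes rapidly, the frequency of heavy snowstorms is expected to increase and to become a major threat to safe navigation. While much of the literature aims to improve the resilience of navigation to winter conditions, there is a lack of standard metrics for quantifying the loss of lidar sensor visibility caused by precipitation. This chapter proposes a novel metric to quantify lidar visibility loss in real time, relying on the notion of visibility from the field of meteorological research. We evaluate this metric on the Canadian Adverse Driving Conditions (CADC) dataset, correlate it with the performance of a state-of-the-art lidar-based localization algorithm, and evaluate the benefit of filtering point clouds before the localization process. We show that the Iterative Closest Point (ICP) algorithm is surprisingly robust against snowfall, but that abrupt events, such as snow gusts, can greatly hinder its accuracy. We discuss such events and demonstrate the need to focus more closely on these extreme events in order to quantify their effect.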


Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP

Johann Frei, Frank Kramer

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-08-30

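Obtaining text datasets with semantic annotations is an effortful process, yet it is crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When working with non-English languages for medical data processing, this exposes several minor and major interconnected problems, such as the lack of task-matching datasets and of task-specific pre-trained models. In our work, we suggest leveraging pretrained language models for training data acquisition in order to retrieve datasets large enough to train smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset that we use to train a medical model for German texts, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at: https://github.com/frankkramer-lab/gptnermed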


Quantitative Universal Approximation Bounds for Deep Belief Networks

Julian Sieber, Johann Gehringer

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-18

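We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q \in [1, \infty]$ ($q = \infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative bounds on the approximation error in terms of the number of hidden units.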
